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Abstract 

The latent multinomial model (LMM) model of Link et al. (2010) provided a general 
framework for modelling mark-recapture data with potential errors in identification. 

Key to this approach was a Markov chain Monte Carlo (MCMC) scheme for sampling 
possible configurations of the counts true capture histories that could have generated 
the observed data. This MCMC algorithm used vectors from a basis for the kernel of 
the linear map between the true and observed counts to move between the possible con¬ 
figurations of the true data. Schofield and Bonner (2015) showed that a strict basis was 
sufficient for some models of the errors, including the model presented by Link et al. 
(2010), but a larger set called a Markov basis may be required for more complex mod¬ 
els. We address two further challenges with this approach: 1) that models with more 
complex error mechanisms do not fit easily within the LMM and 2) that the Markov 
basis can be difficult or impossible to compute for even moderate sized studies. We ad¬ 
dress these issues by extending the LMM to separately model the capture/demographic 
process and the error process and by developing a new MCMC sampling scheme using 
dynamic Markov bases. Our work is motivated by a study of Queen snakes {Regina 
septemvittata) in Kentucky, USA, and we use simulation to compare the use of PIT 
tags, with perfect identification, and brands, which are prone to error, when estimating 
survival rates. 

Keywords: Bayesian Inference; Markov basis; Markov chain Monte Carlo; Mark-recapture; 
Misidentification; Queen snake {Regina septemvittata) 

1 Introduction 

Standard models for data from studies of marked individuals require that researchers are 
able to identify marked individuals uniquely and without error. However, these assump¬ 
tions may be violated in many ways. Researchers may misread marks and provide partial 
identihcations based on visual sightings or poor quality photographs (McClintock et al., 
2014; Morrison et al., 2011), allelic dropout may lead to incorrect identihcations from DNA 
samples (Lukacs and Burnham, 2005; Wright et al., 2009; Yoshizaki et ah, 2011), man-made 
tags may be lost or degrade (Cowen and Schwarz, 2006), and natural marks may evolve over 
time (Yoshizaki et ah, 2012). Link et al. (2010) described a new framework to allow for 
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misidentification in mark-recapture data or, in their words, data that present a “mangled 
and incomplete summary” of the true capture histories. The framework depends on writ¬ 
ing the observed capture history frequencies as a linear function of the true frequencies, 
described through the equation x = An, and was called the latent multinomial model 
(LMM) (Link et ah, 2010). Key to htting this model was a novel Markov chain Monte Carlo 
(MCMC) sampling algorithm that used a subset of elements in the kernel of A, ker(A), to 
generate proposals for n in a Metropolis-Hastings update step. In particular. Link et ah 
(2010) draw elements from a basis for ker(A). 

Schofield and Bonner (2015) showed that more care may be needed to ensure that the 
resulting Markov chains are irreducible and cover the entire sample space. Link et al. (2010) 
focused on one model of errors in closed populations, M^, and provided details of the specihc 
basis used in the algorithm only for T = 2 capture occasions. However, the discussion 
implied that the algorithm could be implemented with any basis of ker(A) and could be 
applied whenever the observed counts can be written as a linear function of latent counts. 
Schoheld and Bonner (2015) proved that the specihc basis chosen for when T = 2 
does produce irreducible chains, but they showed that this is not true for all bases. They 
also provided examples of models with more complex error for which no set of linearly 
independent elements ker(A) can produce irreducible chains. To address these issues they 
hrst provided a method for constructing a basis that is guaranteed to produce irreducible 
chains when the errors form simple corruptions (i.e., they corrupt the capture history for 
a single individual). This includes model Mta- Furthermore, they showed that algorithms 
for more complex models can be constructed by adding elements to the basis to construct 
a so called Markov basis which guarantees that the resulting Markov chains are irreducible 
(Diaconis and Sturmfels, 1998). 

Although the use of Markov bases seems to address the problems with more complex 
models, two challenges remain. First, it can be difficult to describe the distribution of the 
latent frequencies when errors affect multiple individuals. This makes it difficult to cast the 
model in the LMM framework. Second, the approach of computing Markov bases directly is 
impractical for realistic mark-recapture data sets. General methods for computing Markov 
bases have been developed and implemented in software packages like 4ti2. However, the 
Markov bases for these models are so large or complex that they cannot be computed with 
current hardware. For the specihc model we present in Section 3, 4ti2 exceeded the memory 
on a computer with 8 GB of RAM when the experiment contained 5 capture occasions or 
more. 

We hrst develop an extension of the LMM that separates the models of the population 
demographics/capture process and error process by introducing a second set of latent counts 
(Section 4), and then describe a new MGMG algorithm using dynamic Markov bases to 
avoid explicit computation of the Markov bases (Section 5). Our work is motivated by a 
study of queen snakes {Regina septemvittata) in Jessamine Gounty, Kentucky. Since 2013, 
snakes have been marked with subdermal passive integrated transponder (PIT) tags, and 
this method has several advantages. Identihcations from PIT tags are almost 100% reliable 
and PIT tags can be detected from a distance (up to 42 cm) so that snakes can be identihed 
without physical capture and displacement of habitat (see Gonnette and Semlitsch, 2012). 
The use of PIT tags may also increase the detection rates, but PIT tags are expensive costing 
$5 per tag plus almost $3000 per receiver. As an alternative, snakes may be marked with 
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unique brands applied with handheld medical cautery units (Winne et ah, 2006). The cost 
of brand snakes is negligible, but branded snakes must be physically captured to be identihed 
and brands are easy to misread. The extensions of the LMM we provide would allow us to 
model brand data while accounting for potential errors. We use simulations based on the 
available PIT tag and brand data to assess the impacts that these errors would have on the 
estimation of survival rates and discuss how these extensions can be applied to a broader 
range of models. 

2 The Latent Multinomial Model 

The LMM of Link et ah (2010) accounts for possible errors in the data by recasting the mark- 
recapture model. Suppose that / different capture histories could be observed during the 
study. If the population is homogeneous then the /-vector of counts, n, recording the number 
of times each history was observed is a sufficient statistic. Unfortunately, the distribution of 
n depends on both the mark-recapture and error processes and may be difficult to compute 
directly. To make the likelihood tractable, the LMM introduces a set of J > I latent histories 
which identify the true captures for each individual and describe what errors occurred. Let 
X be the unobserved J-vector of counts for these latent histories. The like lih ood can be 
dehned by summing probabilities over all values of x consistent with n. In particular, the 
LMM assumes that there is a linear relationship between n and x so that n = Ax for some 
known I x J matrix A. The likelihood can then be computed as 

7r(n|0) = l(n = Ax)7r{x\6) = 7r{x\9) (1) 

cceN-^ xeTn 

where !(■) is the indicator function and 6 the vector of parameters, Tn = {x ^ : n = Ax} 

is the inverse image of n (called the n-£bre in algebraic statistics), and N is the set of natural 
numbers including 0. 

As an example. Link et al. (2010) considered an extension of the time-dependent, closed 
population model Mt of Otis et al. (1978) called Mta- This model was hrst described by 
Yoshizaki et al. (2011) and makes two assumptions regarding errors: 1) that captured in¬ 
dividuals are correctly identihed with probability a independent of all other events and 2) 
that the identities resulting from errors are unique and do not match the marks of other 
individuals in the population or the identities generated by previous errors. The second 
assumption implies that each error removes a single observation from one individual’s true 
capture history and produces a new observed history containing a single capture event. 

Although equation (1) may make it easier to compute the likelihood function in theory, 
is often so large that exact computation is not practical. Instead, Link et al. (2010) proposed 
to sample from the joint posterior distribution of both the latent vector, x, and the model 
parameters, 6. The specihc MCMC algorithm uses a block Metropolis Hastings (MH) ap¬ 
proach and the primary challenge lies in constructing proposals oi x\6 which are likely to lie 
inside the hbre. The algorithm starts by dehning a lattice basis for the kernel of A; that is, a 
set B = {bi ,..., bx] such that any b in ker(A) with integer entries, b G ker(A) f) can be 
written as a linear combination of the elements of B with integer coefficients ci,. .., G Z. 
Integer multiples of the elements from the lattice basis are then added to the current value 
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of X one-at-a-time to generate new proposals that are accepted or rejected before continuing 
to the next element. The magic of this approach is that A(x + cb) = Ax for any x G Tn, 
b G ker(A), and c G Z so that the proposal a:;(P^®P) = x + Ckb^ also has integer entries and is 
guaranteed to satisfy the linear constraint. Note, however, that a;*^P^°P) may still fall outside 
J^n since there is no guarantee that its entries will all be non-negative. 

Link et al. (2010) implied that Markov chains constructed with this algorithm would 
connect all elements in and hence be irreducible. Schoheld and Bonner (2015) showed 
that this is true for model Mta provided that the right lattice basis is chosen and extended this 
result to a broader class of models in which errors constitute simple corruptions. However, 
they also provided examples of more complicated models for which the algorithm does not 
produce irreducible Markov chains. The central problem is that some pairs of elements in 
J^n may be connected by the algorithm above only by passing through values of x containing 
negative entries. Since these elements he outside of J^n and have zero probability under the 
posterior, the chain will never be able to move between Xi and X2 and will not be irreducible. 

Irreducible chains can always be produced by adding linear combinations of all elements 
in B simultaneously instead of adding elements one at a time, but the resulting proposals 
are likely to contain negative entries and Diaconis and Sturmfels (1998) reported that this 
method is not efficient. Instead, Diaconis and Sturmfels (1998) suggested using the one-at-a- 
time algorithm but drawing the elements from a larger subset A4. C ker(A) chosen to ensure 
that it is possible to move between any two elements of Diaconis and Sturmfels (1998) 
called A4. a Markov basis and the elements of A4. moves. We consider the special case of the 
algorithm presented by Schoheld and Bonner (2015) in which one element is selected from 
the Markov basis on each iteration of the MCMC algorithm and either added to or subtracted 
from the current conhguration without a multiplier. Details are given in Algorithm 1 . 

Dehne a Markov basis, AT. 

Initialize 6f\ 02^\ and x^^^ so that n = Ax^^^^ 

Set k = 1. 

1) Update 0 conditional on Call the result 0 ^), 

2) Update x conditional on 0 ^) 

a) Select 6 G AT and c G {—1,1} 

b) Set ccP^P = ajA-i) _)_ cb. 

c) Calculate the Metropolis acceptance probability: 

r 7r(n,a;P™P|6>(^)) q(x^^-^'>\xP^^P)] 

\ ’ 7r(n, a;U-i)|0(*:)) g(a;P™P|a:;A-i)) j 

where q{x'\x) is the probability of proposing x' given the current state x. 

d) Set x^^'> = a^P^P with probability r. Otherwise, set x^^'> = . 

3) Increment k. 

Algorithm 1: MCMC algorithm for sampling from the joint posterior distribution of 6 
and X given a hxed Markov basis, AT. 
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3 Data and Models 


The data we consider comes from a study of queen snakes conducted along Little Hickman 
Creek in Jessamine County, Kentucky. An initial sample of 61 snakes was captured and 
marked in the fall of 2013 and a second sample of 41 snakes was marked in the spring of 2014. 
All snakes were implanted with PIT tags and a subset of 73 snakes were also branded with 
unique marks as described in Winne et al. (2006). In the summer of 2014, two technicians 
visited the site to locate and identify snakes approximately every two weeks. On each visit 
the technicians conducted searches using a PIT receiver and attempted to physically capture 
any snakes that were detected so that their brands could be read. The 102 snakes were re¬ 
encountered 191 times in total, an average of 1.87 per snake. The researchers conducting the 
study are primarily interested in modelling the survival and movements of the snakes in this 
population and in understanding the individual and population level impacts of snake fungal 
disease, am emerging pathogen about which little is currently known (Allender et ah, 2013; 
Sleeman, 2013). For illustration, we focus on modelling the snakes’ apparent over-wintering 
survival, the probability that a snake marked in the fall of 2013 is still in the population in 
2014. 

Previous studies have found that snakes may expel PIT tags (e.g. Roark and Dorcas, 
2000) and some loose tags were found at the study site. However, we believe that the rate of 
expulsion is small and there is no reason to think that PIT tags are ever misidentihed. With 
these assumptions capture histories formed using the PIT tag encounters can be modelled 
with standard Cormack-Jolly-Seber type models ignoring potential identihcation errors or 
tag loss (see Lebreton et ah, 1992; Seber, 2002; Williams et ah, 2002, and references therein). 
On the other hand, the brands can be difficult to read and the identification of physically 
captured snakes is prone to error. A total of 9 branded snakes were recaptured physically 
during the summer of 2014. By comparing with the PIT tag records we knew that the 
technician who had originally branded the snakes identified 8 of 9 (89%) correctly while 
the second technician identified only 6 of 9 (67%) correctly. The small number of physical 
recaptures did not allow us to compare results based on the PIT tag and brand data directly. 
Instead, we examine the feasibility of branding snakes by analyzing simulated brand data 
generated with error rates matching those observed from the two technicians. 

The specific model we consider both for generating and analyzing the simulated brand 
data combines the standard Cormack-Jolly-Seber (CJS) model for the capture process and 
the band-read error (BRE) model defined Schofield and Bonner (2015). Suppose that re¬ 
searchers visit a location on T occasions. On each visit they capture a number of un¬ 
marked individuals, mark them, and return them to the population. At the same time, 
the researchers also conduct visual surveys to identify previously marked individuals. The 
assumptions of the BRE model are: 

1. Resighted individuals are correctly identified with probability a on each occasion, 

2. Errors cause one marked individual to be misidentified as another marked individual. 

3. Each individual can only be involved in one event on any one occasion. In particular, it 
is not possible to resight individual i and to mistake another individual for individual 
i on the same occasion. 
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Assumption 2 contrasts directly with the assumptions of model and is justified by the 
differences between man-made marks and natural marks. Model is intended for use with 
natural marks including genotypes and pigmentation patterns. The set of possible natural 
marks is usually unknown and the number of possible marks is so large that it unlikely for 
an error to reproduce the identity of another individual exactly. On the other hand, the 
BRE model is intended for use with man-made marks. The set of possible marks is known 
when using man-made marks, and this means that erroneous sightings of marks which have 
never been released can be detected and removed from the data prior to the analysis. The 
only errors that cannot be detected occur when one marked individual is mistaken for another 
marked individual. The third assumption simplihes the model and we plan to relax this in 
future work. Details on the likelihood for this model are provided in Section 4 after we 
introduce the new modelling framework. Note that erroneous sightings of marks that were 
released is only appropriate when estimating survival, as we do here, and leads to biased 
estimates of abundance (see White and Shenk, 2001; McClintock et ah, 2014). 

4 Extended Framework 

The framework of Link et ah (2010) focused on models in which x follows a multinomial 
distribution. This is the case for model Mta and although they suggested that the methods 
could be applied more generally examples were not provided. The CJS/BRE model does 
not result in a multinomial distribution for and it is difficult to determine the density of 
X for the CJS/BRE model explicitly, making it hard to apply the LMM directly. 

To address this, we extend the LMM to include a second vector of latent counts which 
allows the mark-recapture process and the error mechanism to be modelled separately. Sup¬ 
pose, for example, that an experiment has T = 2 occasions and individual i is captured on 
both occasions, correctly identihed on the first occasion, and identihed as an entirely new 
individual on the third occasion (this is the error mechanism for model Mta)- In the terminol¬ 
ogy of Link et ah (2010), individual i would have latent history i/t = 12 and would produce 
the recorded histories lju = 10 and u}i 2 = 01. The original LMM assigns probabilities to the 
latent histories, Ui, directly by simultaneously modelling the capture and error processes. 
Our formulation introduces a second latent history, ^i, identifying the occasions on which 
the individual was truly captured but ignoring the errors. The new latent history would be 
^ = 11 since the individual was truly captured on both occasions. We then model the joint 
distribution of Ui and by assigning probabilities first to and second to Ui given We 
distinguish between the two sets of latent histories by calling Ui the latent error history and 
the latent capture history. 

Generally, we let n be the /-vector of counts for the observable histories, x the J-vector 
of counts for the latent error histories, and z the //-vector of counts for the latent capture 
histories. As in Link et ah (2010), we assume that n = Ax for some known matrix A. 
Further, we assume that z = Bx for some known matrix B. The complete data likelihood 
is then constructed in two stages: 1) modelling the process of capturing, marking, and 
recapturing individuals to dehne 7r(z|0) and 2) modelling the error process conditional on 
the true captures to dehne 7t{x\z,6). We expect that the parameters in these components 
will be separate so that we can represent them by the disjoint sets 6 i and 62 . The posterior 
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distribution of the complete data and parameters is 


7r(cc, z, 01, 02|^^) oc l(n = Aa;)7r(a;|z, 02)7r(z|0i)7r(0i)7r(02) 


where vr(0i) and 7r(02) represent priors assumed to be independent. When considering a 
specihc history (observed or latent) we index the vectors of counts both by index and by 
name. For example, n* represents the count for the ith observable history, using some implicit 
ordering, while represents the count of history uj. A table summarizing the notation is 
provided in Appendix A. 

To £t the CJS/BRE into the extended framework we need to 1) identify the sets of 
observable histories, latent error histories, and latent capture histories; 2) construct the 
constraint matrices; and 3) define the components of the likelihood function. Conditioning on 
the hrst release we can exclude both the unmarked individuals and the individuals marked on 
the hnal occasion from the likelihood. This leaves I = 2^ — 2 observable histories containing 
the events 0 and 1 excluding the zero history and the history ending with a single capture. 
The latent capture histories also belong to the same set so that iC = 2^ — 2 as well. In 
dehning the latent error histories, four events can occur on each occasion after marking. The 
Ah individual may be not resighted (event 0), resighted and correctly identihed (event 1) or 
resighted and incorrectly identihed (event 2). Another marked individual may be captured 
and incorrectly identihed as individual i (event 3). Events 2 and 3 represent false negative 
and false positive resightings. 

Next, we construct the constraint matrices A and B. One factor that makes the 
CJS/BRE model more complicated than model Mta is that it contains constraints on x 
beyond those imposed by the observed counts. In particular, 7r(cc|z, 62 ) > 0 only if the num¬ 
ber of false positives and false negative captures are equal on all occasions. The A matrix 
is constructed as 



where Ai is a (2^ — 2) x J matrix modelling the relationship between x and n that is dehned 
similar to the matrix A' in Link et ah (2010), and A2 is a (T — 1) x J matrix constraining 
the number of false positives and negatives on the hnal T — 1 occasions. Mathematically, 


Alp — 


1 if ojit = t{ujt = 1) J- t{ujt = 3) for alH = 1,..., T 
0 otherwise 


and 


A2P - 


-1 if z/j-i+i = 2 

1 if = 3 
0 otherwise 


The tth row of A2 computes the diherence between the number of 2s and 3s in the latent 
error histories, and the vector n must also be extended by concatenating T — 1 extra Os 
corresponding to the added constraints. The B matrix is dehned such that Bjk = 1 if the 
jth latent capture history has the same pattern of captures as the kth latent error history. 
That is 


Bjk — 


1 if ^kt = = 1) + = 2) for alH = 1,..., T 

0 otherwise 
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Finally, we define the distributions of 2 and xjz. Let at denote the number of individuals 
hrst captured and marked on occasion t, Mt the number of individuals marked before occasion 
t, and rrit the number of these individuals resighted on occasion t. The density of z is a 
product multinomial 

^(^ 1 ^ 1 ) = Ylpri^kWiY^ 

Uk=i^kltJi 

where pr{^k\Gi) denotes the probability assigned to history by the CJSmodel. To construct 
the second component of the likelihood we consider occasions t = 2,... ,T separately hrst 
modelling the number of errors that occur, denoted by Et, and then modelling the exact 
conhguration of false positives and false negatives. The assumptions of the BRE model 
imply that Et < m* = min(mt, Mt — rrtt) and so we model Et according to the (possibly) 
truncated binomial with density 

pr[Et = et\z,a) oc (1 - Ci = 0,..., m)'. 

where a is the probability of a correct identihcation. We assume that all assignments of 
false positives and false negatives are equally likely conditional on Et. There are (’^() and 
ways to select the false negatives and false positives and it follows that 


pr{x\E2, ...,Et,z) 



-1 


The second component of the likelihood is 


(1 - 

.(tS)‘) (T)(i - 

where et{x) = = 2) represents the number of errors in conhguration x. The 

initial term accounts for the many relabellings of the marked individuals that would produce 
the same counts in x and z. 


nf=i Zki 

n 


7r{x\z,a) = l(z = Bx] 


n 


j=l t=2 


5 Dynamic Markov Bases 

As in Link et al. (2010), the main difficulty in sampling from the joint posterior distribution 
of 6>i, 02 , X, and z lies in generating proposals for x that are inside En with high probability. 
The addition of the second vector of latent counts does not complicate matters because z 
is a deterministic function of x. This means that consistent proposals for x and z can be 
constructed jointly by generating a proposal for x and then setting z = Bx. 

We initially tried to construct Markov bases with the software package 4ti2 which uses 
general methods based on the theory of toric ideals to compute Markov bases. Unfortunately, 
the Markov bases for the CJS/BRE model grow so quickly with T that they cannot be 
computed for studies of a reasonable size. For T > 5, 4ti2 ran out of memory on a computer 
with 8 GB of RAM before completing the calculations. 









To avoid this problem we make use of dynamic Markov bases. Dobra (2012) defined 
a dynamic Markov bases to be a collection of sets of local moves, Ai(x), which connect 
each X G J-n to a relatively small number of neighbours. On the kth iteration of the 
MCMC algoirthm a proposal is generated by sampling a move from This avoids 

the need to compute the entire Markov basis and increases the efficiency of the algorithm 
because the local moves are more likely to generate proposals inside the hbre. 

Dobra (2003) initially proposed a method for dynamically generating moves for the prob¬ 
lem of sampling from a table of arbitrary dimension with constraints on some marginal sums. 

In particular, they showed that the set of primitive moves containing two Is and two -Is 
forms a Markov basis for such models and developed an algorithm for dynamically selecting 
moves from this basis. Dobra (2012) generalized this to the problem of sampling tables 
of arbitrary dimension constrained by hxing some marginal sums and also placing bounds 
on individual cell entries. The CJS/BRE models does not £t into the framework because 
the constraints on the number of false negatives and false positives do not correspond to 
marginal sums or bounds on individual cells. However, a dynamic Markov basis containing 
only primitive moves can still be constructed as follows. 

Dehne Xvt{,x) = {i/ : ut = v and > 0} to be the subset of latent error histories with 
event v on occasion t and positive entries in x. The moves 'm.M.{x), denoted by 6(1^0,1^1, *^2, n's), 
each contain two -Is associated with histories uq G Xqi^x) and G <Tit(a;) and two Is as¬ 
sociated with histories 1^2 ^ and 1^3 G Xzt{x) for some common t. The moves can be 

dehned by drawing pairs of elements from any two of the four sets, but it makes most sense 
to consider drawing the histories from Xot{x) and Xit{x) and then constructing the histories 
in X2t{x) and X^t{x) or vice versa. Given histories uq G XQt{x) and ui G Xit{x) the histories 
1^2 £ ^ 2 t{,‘x) and U 3 G X^t^x) are constructed by setting 


n2i = 


I/Qs a s 

2 ii s = t 


and z/ 3 f 


U 3 t ii s 

3 if s = t 


More compactly, 1^2 = + 2St and 1^3 = 1^2 + 2St where 5t represents the J-vector with a 

single 1 in entry t. The entry of h(^'o, 1 ^ 2 , 1 's) associated with the latent error history jz is 




—1 ii u = uq OT u = Ui 

1 if 1 / = 1/2 or 1 / = 1/3 

0 if Otherwise 


( 2 ) 


Alternatively, the same set of moves can be obtained by pairing all 1^2 G X 2 t{x) and 1^3 G 
X 3 t{x), setting vq = U 2 — 2 St and i^i = 1/3 — 2 dt, and defining the entries h(i/o, JZi, 1 ^ 2 , 1 ^ 3 ) 
exactly as above. Heuristically, the operation x + c b{uo, ui, U 2 , 1 ^ 3 ) adds c errors when c > 0 
and c removes errors when c < 0 . 

For each x G iFn , M{x) is the union of two sets of moves constructed as in equation (2): 


M.i[x) = {b{uQ, i^i, U2,1^3) : Vq G XQt{x) and Ui G Xit{x) for some t} 


and 

M2{x) = {b{uQ, 1>1, 1/2, V3) : U2 G X 2 t{x) and 1/3 G X 3 t{x) for some t}. 

The hrst set contains all moves that incorporate new errors when added to x while keeping 
the proposal inside iFn- The second contains all moves that remove errors when subtracted 
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from X and again keep the proposal inside J^n- On the kth iteration a proposal for x is then 
generated by sampling c G { — 1,1}, selecting 

if c = —1 
if c = 1 ’ 

and setting = x^^~^'> + cb. Full details are provided in Algorithm 2. That the algorithm 
connects all elements in is given by Theorem 1. 

Theorem 1. Let Xi,X 2 G J^n- Then there exists a sequence of moves 6i,..., 6^. ond coeffi¬ 
cients Cl,..., Ci G { — 1,1}, for some L, such that X 2 = Xi-\- 'Yld=i + J^iLi ^ -Tn; 

and 

bie M Ixi-^^bi 

\ 1=1 

for all L' = 1,..., L. 

The proof in Appendix B shows that any element in can be connected to the unique 
conhguration with no errors, denoted by x^, by subtracting elements from M 2 {x) to remove 
errors one at a time. Any other element in can then be reached by adding elements from 
AA.i{x) to add errors one at a time. 

One further advantage of this approach is that moves can be sampled from Ad (cc) without 
ever having to compute this set explicitly. If c = 1 then b G Adi(a;) can be drawn by: 1) 
selecting uq G 2)sampling s G {t : z/qi = 2}, and 3) selecting Vi G A’is(cc). 

If c = —1 then b G Aii{x) can be drawn by: 1) selecting 1/2 G Ut=i'^ 2 t (®)5 2) sampling 
s E {t : h' 2 t = 0}, and 3) selecting 1/3 G X^g^x). In either case the proposal density, q{x'\x), is 
proportional to the inverse of the product of the cardinalities of the sets in each of the three 
steps. It is possible that the one of three sets may be empty. This happens if we try to add 
an error when no correct identifications are available (if c > 0) or to remove an error when 
no errors exist (c < 0). In these cases we retain x^^~T with probability one and continue to 
the next iteration. 

Although we were not able to compute a minimal Markov bases for the CJS/BRE model 
with the software package 4ti2 we can construct a full Markov basis as the set of all moves 
that add or remove errors. Samples from the joint posterior distribution of 6 , x, and z 
for the CJS/BRE model could then, in principle, be generated using this Markov basis in 
Algorithm 1 with an additional step to compute as in Algorithm 2. However, 

this Markov basis is so large that the approach is entirely impractical. It contains more than 
6.41 X 10^° elements when T = 10. A matrix of this size would require 128 GB of memory 
even when stored as two lists of indices identifying the Is and -Is, and it is highly unlikely 
(almost impossible) that a randomly selected move would produce a proposal inside 



6 Computational Efficiency 

To illustrate the gain in efficiency from using our dynamic Markov basis we present results 
from analyzing a single simulated data set with T = 4 capture occasions (the largest number 
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Initialize 02°^ so that n = Aa;® and = Bx^^\ 

Set k = 1. 

1) Update 01 and 02 conditional on and Call the results 0|^^ and 6^\ 


2) Update x conditional on 0^^^ and 6^'^ as follows. 

a) Sample c e {—1,1} and b G M{x^'‘~^^). 

b) Set a;P™P = + cb and 2 P™p = SccP^p. 

c) Calculate the Metropolis acceptance probability: 



where q{x'\x) is the probability of proposing x' given the current state x. 

d) Set x^^'> = ajP^P and z^^'> = zP™p with probability r. Otherwise, set 
and z^'^'> = 

3) Increment k. 

Algorithm 2: Proposed algorithm for sampling from the posterior distribution of 0i, 02, 
X, and z using dynamic moves. 

for which we can compute the Markov basis using 4ti2). Data was generated for a sample 
of 30 individuals, 10 released on each of the first three capture occasions, with constant 
survival probability 0i = 02 = 03 = -8, constant capture probability P2 = Ps = Pa = -5, and 
error rate a = .5. Samples from the joint posterior distribution of x and z were then drawn 
using both the Algorithm 1 which uses the hxed Markov bases, modihed for the extended 
framework as described in Section 4, and Algorithm 2 which generates moves dynamically. 
The parameters, 0, were hxed at their true values. 

To assess how well the two chains mixed we compared the acceptance rates and the 
number of unique solutions for x identihed per accept/reject step. The chain constructed 
using Algorithm 1 identihed a total of 79 unique conhgurations among the 7,500 values of x 
sampled after the burn-in phase. Less than 1% of the proposed conhgurations were accepted. 
In comparison, the chain constructed with Algorithm 2 identihed 2548 unique conhgurations 
and 38% of the proposed conhgurations were accepted. Figure 1 also provides traceplots of 
the number of errors in the conhgurations sampled by the two chains on each accept/reject 
step. These summaries all make it clear that the chain constructed from Algorithm 2 is 
mixing and moving through the hbre much more quickly than the chain constructed from 
Algorithm 1 1. 


[Figure 1 about here.] 


7 Results 


In our analysis of the queen snake data we ht an initial CJSmodel to the original PIT tag data 
(Model 1). We then simulated data mimicking what might be observed from the branding 
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data by adding errors in the PIT tag data according to the BRE model using the observed 
identihcation rates, a = 8/9 and a = 6/9, and reht the CJSmodel to each data set to assess 
the effects of errors that are not modeled (Model 2). Finally, we fit the BRE model to the 
corrupted data sets using our MCMC implementation with dynamic moves (Model 3). The 
data for all models included 10 capture occasions: the marking session in 2013, the marking 
session in spring of 2014, and the 8 visits to resight/recapture snakes in the summer of 2014. 
One hundred simulated data sets were generated for each value of a. 

[Figure 2 about here.] 

Parameters for the data generating models were based on analysis of the original PIT 
tagging data. Analysis of the PIT tag data using classical, maximum likelihood methods 
in Program MARK (White and Burnham, 1999) strongly supported a simplified CJSmodel 
which allowed the capture probabilities to vary independently across all occasions but con¬ 
strained survival to be equal across the final 8 occasions. Specifically, the estimated survival 
probabilities were = 0.66, 02 = 1-00, and 03 = ... = 09 = 0.93. The first of these 
represents the overwintering survival, the probability that a snake marked in 2013 is still 
alive in the study area in the summer of 2014, and is of primary interest. The capture and 
survival probabilities were assigned uniform prior distributions in all models. When fitting 
Model 2 and Model 3 we assumed that a was known. The correct identification rate would 
need to be estimated in practice but the data are likely contain little information about this 
parameter, as noted by Link et al. (2010). To avoid problems with identifiability Link et ah 
(2010) assign a an informative 0(19,1) prior distribution which places 95% of its mass be¬ 
tween 0.85 and 1.00. Alternatively, one could use double observers, repeated identifications, 
or double marks to estimate a directly. 

Figure 2 summarizes the posterior distributions for the survival probabilities obtained 
from the three models. In particular, we compare the bias of the posterior means and the 
width and coverage of the central 95% credible intervals. The coverage of 02 is zero for all 
models and is not reported because the parameter lies on the boundary of the parameter 
space. 

The most important differences are for the estimation of the overwintering survival rate, 
01 . Fitting the data without errors (Model 1) provides estimates of all three parameters 
that are almost unbiased. Coverage probabilities of 0i and 03 are also above the nominal 
value. Ignoring the errors and fitting the same CJSmodel (Model 2) produced very poor 
results. When a = 8/9 the estimated bias in the overwintering survival rate was 0.15 
(23%) and the coverage of this parameter was only 47%. The bias increased one and a 
half times to 0.24 (36%) when the correct identification probability decreased to a = 6/9 
and the coverage dropped to only 6%. Although posterior means of 0i and 03 from Model 
2 were not significantly biased, the credible intervals were too narrow and the coverage of 
03 was still very low when a = 8/9. In comparison, point estimates of 0i and 03 from 
the CJS/BRE model (Model 3) were negligibly biased for both levels of error. Coverage of 
these parameters was almost identical for Models 1 and 3 and exceeded the nominal rate. All 
models produced biased estimates of 02 , which is not surprising given that the true parameter 
lies on the boundary of the parameter space. The posterior mean of 02 from Model 3 was 
significantly more biased than that of Model 1, underestimating 02 by 9% when a = 8/9 and 
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16% when a = 6/9. This is due to there being more signihcant shrinkage toward the prior 
mean of 0.50 when there is more uncertainty in the data. As expected, credible intervals 
from Model 3 wider than those from Model 1 to account for the extra uncertainty introduced 
by the errors. 

8 Discussion 

The results in Section 7 clearly illustrate the problems with misidentihcation. The overwin¬ 
tering survival probability was overestimated by 23% when the error rate was low and 36% 
when the error rate was high. In comparison, estimates from the CJS/BRE model were al¬ 
most unbiased and the credible intervals had above nominal coverage. The extra uncertainty 
in the error does increase the posterior variances and so the obvious recommendation is to 
reduce error rates experimentally by using marks that are easier to identify, double tagging 
individuals, or using multiple observers. Uncertainty could also be reduced pairing observers 
to decrease the error rates or simply by increasing the number of hours spent in the held to 
raise capture rates. Of course, these options would require more man power and increase 
the cost of branding as a method of marking individuals. We are currently assessing these 
options to hnd the optimal method of studying the queen snake population. 

We have described our methods specihcally for the CJS/BRE model, but we believe that 
the methods should be applicable to broad range of mark-recapture models with possible 
identihcation errors. The extended framework described in Section 4 can incorporate more 
complex models of both the capture and error processes than the original LMM and is 
particularly useful when the distribution of the joint histories described by the combining 
processes is complicated or intractable. The algorithm based on dynamic Markov bases 
presented in Section 5 essentially entails moving through the hbre, Tn, by adding or removing 
errors one at a time, and we expect that the same procedure can be applied widely. There 
are two important caveats. First, it must be possible to write the model in terms of the 
two linear constraints described in the extended framework. This will not always be the 
case and does not happen if we extend the BRE model so that individual i can be captured 
on occasion t and another individual can be captured and identihed as individual i at the 
same time. We are working to extend these models to allow for such events. The second 
caveat is that the Markov chains derived from the new algorithm may not be irreducible if 
the posterior distribution assigns probability zero to some elements in the hbber. This might 
occur if certain conhgurations of the errors can be ruled out a priori 

Another issue that remains to be addressed is the level of connectivity and efficiency 
of the chains produced with the different algorithms. Any Markov basis is guaranteed to 
connect all elements in iFn for any n, but the chains produced by different algorithms will 
connect the hbre in different ways and the result chains will mix more or less well. As 
an extreme example, a certain Markov basis may divide the hbre into two parts each with 
many internal connections but with only one connection between. In this case the hbre is 
connected but any Markov chain generated by this basis will tend to remain in one of the 
two parts and mix poorly. These issues are complex and there is very little literature within 
the held of algebraic statistics providing metrics to compare the connectivity for diherent 
Markov bases or assessing the ehects on efficiency. We hope that further developments in 
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this area will provide guidance on choosing between Markov bases that we can apply to the 
problem of misidentihcation in mark-recapture studies. 

A Notation 

Algebraic Statistics and Markov Bases: 

Tn n-fiber: Tn = {x G : n = Ax}. 

B Lattice basis for ker(A). 

Ai Markov basis for ker(A). 

Ain Markov subbasis for ker(A) given n. 

Ai{x) Dynamic Markov basis for ker(A) computed at x. 


Extended LMM : 

(jJi Observed capture history for the marked individual. The 

subscript is dropped when computing probabilities for a gen¬ 
eral history. 

Ui Latent error history for the marked individual. 

Latent capture history for the marked individual. 

n Known vector of counts for the observed histories (indexed 

by i and uj). 

X Unknown vector of counts for the latent error histories (in¬ 

dexed by j and u). 

z Unknown vector of counts for the latent capture histories 

(indexed by k and $,). 

I Length of n. For the CJS/BRE model I = 2^ — 2. 

J Length of x. For the CJS/BRE model J = (4^ — l)/3 — 1. 

K Length of 2 . For the CJS/BRE model K = 2^ - 2. 

A I X J matrix mapping x onto n, n = Ax. 

B K X J matrix mapping x onto 2 , 2 = Bz. 

0i Parameters in the model of z. 

02 Parameters in the conditional model of x given z. 


Band-Read Error Model: 

Pt Capture probability: the probability that an individual alive 

on occasion t is captured, t = 2,... ,T. 

A Survival probability: the probability that an individual is 

alive on occasion t + 1 given that it was alive on occasion t, 
t = l,...,T-l. 

a Correct identihcation rate: the probability that a captured 

individual is identified correctly. 
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B Proof of Theorem 1 


The proof that the sets M (x) dehned in Section 5 connect follows by (reverse) induction 
on the number of errors. Let x G and suppose that in this conhguration the number of 
errors on occasion t is greater than 0 : = 2) = = 3) > 0. Then 

there exist histories 1^2 G and 1/3 G -^ 3 t(x) such that > 0, Xi^^ > 0. We can 

subtract the move h(^'o, 1 ^ 1 , n' 2 , 1 ^ 3 ) to reach a new element with one less error on occasion 
t. Repeating this procedure, we reach an element in Tn with no errors. To complete the 
proof, we note that the only element in J^n with no errors is x^ where 

0 f if 1 / is observable 
I 0 otherwise. 

It follows that every history can be connected to x^ using moves from the dynamic Markov 
basis and that the partial sums remain inside the hber. Following the path in the reverse 
order by adding elements from }Ai{x) we can reach any other element in the hber. 
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Figure 1: Comparison of the chains sampling from the posterior distribution of the 
BRE model applied to the simulated data. The hgures trace the number of errors in x 
for the algorithms using the hxed basis (left) and dynamic basis (right). The grey dotted 
lines represent the true number of errors in the data set. 
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Figure 2: Results of the analysis of the queen snake data. The three panels present the 
estimate bias of the posterior means (Panel A) and the estimated width (Panel B) and 
coverage probability (Panel C) of the 95% credible interval for the survival probabilities for 
the three models described in Section 7. The different models are indicated by the shape of 
the plotting symbol. The rates of error are for Models 2 and 3 indicated by the color of the 
symbol. Coverage of 02 is not reported because the true parameter lies on the boundary of 
the parameter space. 
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